A Multilingual Method for Clause Splitting

Author

  • Georgiana Puşcaşu
Abstract

This paper addresses the clause splitting problem and proposes a multilingual method for detecting clause boundaries in unrestricted texts. The method combines language-independent machine learning techniques with language-specific rules in order to take the first step in building the hierarchical structure of sentences. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a rule-based module which deals with clause boundaries not included in the learning process. Formal indicators of coordination and subordination, together with verb type information (finite or non-finite), are used for identifying clause boundaries. The method was evaluated on Romanian and English, achieving an F-measure for clause start detection of 95% for Romanian and 92% for English.
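The two-stage design described in the abstract (a learned model proposing clause boundaries, followed by a rule-based module that adds boundaries the learner does not cover) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the English-only marker lists, the `Token` type, the hypothetical helpers `rule_based_starts`, `f_measure` and `toks`, and the machine learning output, which is passed in as a fixed list rather than produced by a trained model. It is not the author's implementation. It shows how coordination and subordination markers combined with finite-verb information might add clause starts, and how an F-measure for clause start detection can be computed from predicted and gold start positions.

```python
from dataclasses import dataclass

# Hypothetical, English-only marker lists for illustration; the paper's actual
# Romanian and English rule sets are not reproduced here.
COORDINATORS = {"and", "but", "or"}
SUBORDINATORS = {"because", "although", "when", "while", "if", "that", "which", "who"}
MARKERS = COORDINATORS | SUBORDINATORS


@dataclass
class Token:
    word: str
    finite_verb: bool  # assumed to be supplied by an upstream POS tagger


def rule_based_starts(tokens, ml_starts):
    """Merge ML-predicted clause starts with marker-based rules (hypothetical)."""
    starts = set(ml_starts)
    if tokens:
        starts.add(0)  # the first token always opens a clause
    for i, tok in enumerate(tokens):
        word = tok.word.lower()
        if word in SUBORDINATORS:
            # A subordinator is treated as the start of a subordinate clause.
            starts.add(i)
        elif word in COORDINATORS:
            # A coordinator opens a new clause only if a finite verb follows
            # before the next marker; otherwise it likely joins phrases.
            j = i + 1
            while j < len(tokens) and tokens[j].word.lower() not in MARKERS:
                if tokens[j].finite_verb:
                    starts.add(i)
                    break
                j += 1
    return sorted(starts)


def f_measure(predicted, gold):
    """Balanced F-measure over sets of clause-start positions."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def toks(text, finite):
    """Hypothetical helper: build tokens, marking the given indices as finite verbs."""
    return [Token(w, i in finite) for i, w in enumerate(text.split())]


if __name__ == "__main__":
    s1 = toks("I stayed home because it rained and the roads flooded .", {1, 5, 9})
    starts = rule_based_starts(s1, ml_starts=[0])
    print(starts)                        # [0, 3, 6]
    print(f_measure(starts, [0, 3, 6]))  # 1.0

    s2 = toks("I bought apples and pears .", {1})
    print(rule_based_starts(s2, ml_starts=[0]))  # [0]
```

The coordinator rule encodes the intuition that "and" separates clauses only when a finite verb follows it; in the second sentence it merely joins two noun phrases. A fuller system would also have to handle non-finite clauses, which this sketch ignores.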

Similar resources

Anaphora - Clause Annotation and Alignment Tool

The paper presents Anaphora – an OS and language independent tool for clause annotation and alignment, developed at the Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. The tool supports automated sentence splitting and alignment and modes for manual monolingual annotation and multilingual alignment of sentences and clauses. Anaphora has ...

Chinese Event Descriptive Clause Splitting with Structured SVMs

Chinese event descriptive clause splitting is the task of splitting a complex Chinese sentence into several clauses. In this paper, we present a discriminative approach for Chinese event descriptive clause splitting task. By formulating the Chinese clause splitting task as a sequence labeling problem, we apply the structured SVMs model to Chinese clause splitting. Compared with other two baseli...

Constraint Manipulation in SGGS

SGGS (Semantically-Guided Goal-Sensitive theorem proving) is a clausal theorem-proving method, with a seemingly rare combination of properties: it is first order, DPLL-style model based, semantically guided, goal sensitive, and proof confluent. SGGS works with constrained clauses, and uses a sequence of constrained clauses to represent a tentative model of the given set of clauses. A basic buil...

A hybrid method for clause splitting in unrestricted English texts

It is important to know the structure of the sentence for many NLP tasks. In this paper we propose a hybrid method for clause splitting in unrestricted English texts which requires less human work than existing approaches. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a shallow rule-based module in order to improve the accuracy of the method. The ...

The importance of annotated corpora for NLP: the cases of anaphora resolution and clause splitting

In this paper we present two applications that depend on annotated corpora for their implementation, evaluation and improvement. The first is an automatic anaphora resolution system. After describing the algorithm we discuss the importance of corpora for the tasks of evaluation and automatic scoring and the development of a coreferentially annotated corpus. We go on to look ahead at the role of...

Publication date: 2003